480. Sliding Window Median
Hard

The median is the middle value in an ordered integer list. If the size of the list is even, there is no single middle value, so the median is the mean of the two middle values.

  • For example, if arr = [2,3,4], the median is 3.
  • For example, if arr = [1,2,3,4], the median is (2 + 3) / 2 = 2.5.

You are given an integer array nums and an integer k. There is a sliding window of size k which is moving from the very left of the array to the very right. You can only see the k numbers in the window. Each time the sliding window moves right by one position.

Return the median array for each window in the original array. Answers within 10^-5 of the actual value will be accepted.

 

Example 1:

Input: nums = [1,3,-1,-3,5,3,6,7], k = 3
Output: [1.00000,-1.00000,-1.00000,3.00000,5.00000,6.00000]
Explanation: 
Window position                Median
---------------                -----
[1  3  -1] -3  5  3  6  7        1
 1 [3  -1  -3] 5  3  6  7       -1
 1  3 [-1  -3  5] 3  6  7       -1
 1  3  -1 [-3  5  3] 6  7        3
 1  3  -1  -3 [5  3  6] 7        5
 1  3  -1  -3  5 [3  6  7]       6

Example 2:

Input: nums = [1,2,3,4,2,3,1,4,2], k = 3
Output: [2.00000,3.00000,3.00000,3.00000,2.00000,3.00000,2.00000]

 

Constraints:

  • 1 <= k <= nums.length <= 10^5
  • -2^31 <= nums[i] <= 2^31 - 1
Hint 1
The simplest of solutions comes from the basic idea of finding the median given a set of numbers. We know that by definition, a median is the center element (or an average of the two center elements). Given an unsorted list of numbers, how do we find the median element? If you know the answer to this question, can we extend this idea to every sliding window that we come across in the array?
Hint 2
Is there a better way to do what we are doing in the above hint? Don't you think there is duplicated calculation being done there? Is there some optimization we can apply to achieve the same result? This approach is merely a modification of the basic approach that reduces the duplicated work.
Hint 3
The third line of thought is also based on this same idea but achieving the result in a different way. We obviously need the window to be sorted for us to be able to find the median. Is there a data-structure out there that we can use (in one or more quantities) to obtain the median element extremely fast, say O(1) time while having the ability to perform the other operations fairly efficiently as well?


A word of advice

This problem is a companion to 295. Find Median From Data Stream, and many of the approaches here build on the methods used to solve that problem. Perhaps try that problem before you approach this one.

Solution


Approach 1: Simple Sorting

Intuition

Do what the question says.

Algorithm

Store the numbers in a window container of size k. The following operations must take place:

  1. Inserting the incoming element.
  2. Deleting the outgoing element.
  3. Sorting the window to find the medians.

One primitive approach is to copy k consecutive elements from the input to the window and sort them from scratch every time. This constitutes duplication of effort.

We can do a bit better if we instead insert and delete one element per window shift. The challenge then is to maintain the window as sorted, before and after the insert and delete operations.

Python comes with an excellent bisect module to help perform efficient insert operations on lists while maintaining their sorted property.
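The insert-and-delete variant can be sketched as follows (the function name and structure are illustrative, not the article's reference code):

```python
import bisect

def median_sliding_window_sorted(nums, k):
    # Keep a sorted copy of the current window of size k.
    window = sorted(nums[:k])
    medians = []
    for i in range(k, len(nums) + 1):
        # Read the median off the middle of the sorted window
        # (works for both odd and even k).
        medians.append((window[k // 2] + window[(k - 1) // 2]) / 2.0)
        if i == len(nums):
            break
        # Slide: delete the outgoing element, insert the incoming one,
        # keeping the window sorted. Each step is O(k) due to shifting.
        window.pop(bisect.bisect_left(window, nums[i - k]))
        bisect.insort(window, nums[i])
    return medians
```

Note that bisect only locates positions in O(log k); the underlying list insertion and deletion still shift elements in O(k) time.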

Complexity Analysis

  • Time complexity: O(n · k log k) to O(n · k).

    • Copying elements into the container takes about O(k) time each. This happens about (n - k) times.

    • Sorting each of the (n - k) sliding window instances takes about O(k log k) time.

    • Bisected insertion or deletion takes about O(log k) for searching and O(k) for the actual shifting of elements. This takes place about n times.

  • Space complexity: O(k) extra linear space for the window container.


Approach 2: Two Heaps (Lazy Removal)

Intuition

The idea is the same as Approach 3 from 295. Find Median From Data Stream. The only additional requirement is removing the outgoing elements from the window.

Since the window elements are stored in heaps, deleting elements that are not at the top of the heaps is a pain.

Some languages (like Java) provide implementations of the PriorityQueue class that allow removing arbitrarily placed elements. Generally, using such features is not efficient, nor is their portability assured.

Assuming that only the tops of heaps (and by extension the PriorityQueue class) are accessible, we need to find a way to efficiently invalidate and remove elements that are moving out of the sliding window.

At this point, an important thing to notice is that if the two heaps are balanced, only the tops of the heaps are actually needed to find the medians. This means that as long as we can somehow keep the heaps balanced, we can also afford to keep some extraneous elements.

Thus, we can use hash-tables to keep track of invalidated elements. Once they reach the heap tops, we remove them from the heaps. This is the lazy removal technique.

An immediate challenge at this point is balancing the heaps while keeping extraneous elements. This is done by actually moving some elements to the heap which has extraneous elements, from the other heap. This cancels out the effect of having extraneous elements and maintains the invariant that the heaps are balanced.

NOTE: When we talk about keeping the heaps balanced, we are not referring to the actual heap sizes. We are only concerned with valid elements, and hence when we talk about balancing heaps, we are referring to the count of such elements.

Algorithm

  • Two priority queues:

    1. A max-heap lo to store the smaller half of the numbers
    2. A min-heap hi to store the larger half of the numbers
  • A hash-map or hash-table hash_table for keeping track of invalid numbers. It holds the count of the occurrences of all such numbers that have been invalidated and yet remain in the heaps.

  • The max-heap lo is allowed to store, at worst, one more element than the min-heap hi. Hence if we have processed k elements:

    • If k = 2·n + 1 for some non-negative integer n, then lo is allowed to hold n + 1 elements, while hi can hold n elements.
    • If k = 2·n for some non-negative integer n, then both heaps are balanced and hold n elements each.

    This gives us the nice property that when the heaps are perfectly balanced, the median can be derived from the tops of both heaps. Otherwise, the top of the max-heap lo holds the legitimate median.

NOTE: As mentioned before, when we are talking about keeping the heaps balanced, the actual sizes of the heaps are irrelevant. Only the count of valid elements in both heaps matters.

  • Keep a balance factor. It indicates three situations:

    • balance = 0: Both heaps are balanced or nearly balanced.
    • balance < 0: lo needs more valid elements. Elements from hi are moved to lo.
    • balance > 0: hi needs more valid elements. Elements from lo are moved to hi.
  • Inserting an incoming number in_num:

    • If in_num is less than or equal to the top element of lo, then it can be inserted into lo. However, this unbalances hi (hi has fewer valid elements now). Hence balance is incremented.

    • Otherwise, in_num must be added to hi. Obviously, now lo is unbalanced. Hence balance is decremented.

  • Lazy removal of an outgoing number out_num:

    • If out_num is present in lo, then invalidating this occurrence will unbalance lo itself. Hence balance must be decremented.

    • If out_num is present in hi, then invalidating this occurrence will unbalance hi itself. Hence balance must be incremented.

    • We increment the count of this element in the hash_table.

    • Once an invalid element reaches either heap's top, we remove it from the heap and decrement its count in the hash_table.
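The steps above can be sketched in Python. Since heapq only provides a min-heap, the max-heap lo stores negated values; the function name and structure are illustrative, not the article's reference code:

```python
import heapq
from collections import defaultdict

def median_sliding_window_heaps(nums, k):
    # lo: max-heap (negated values) for the smaller half; hi: min-heap.
    lo, hi = [], []
    to_remove = defaultdict(int)  # counts of lazily invalidated elements
    for i in range(k):
        heapq.heappush(lo, -nums[i])
    # Move the larger half of the first window into hi.
    for _ in range(k // 2):
        heapq.heappush(hi, -heapq.heappop(lo))

    def median():
        return -lo[0] if k % 2 else (-lo[0] + hi[0]) / 2.0

    medians = [median()]
    for i in range(k, len(nums)):
        out_num, in_num = nums[i - k], nums[i]
        # balance tracks the net change in valid-element counts.
        balance = -1 if out_num <= -lo[0] else 1
        to_remove[out_num] += 1
        if in_num <= -lo[0]:
            balance += 1
            heapq.heappush(lo, -in_num)
        else:
            balance -= 1
            heapq.heappush(hi, in_num)
        # Rebalance: move one element toward the deficient side.
        if balance < 0:
            heapq.heappush(lo, -heapq.heappop(hi))
        elif balance > 0:
            heapq.heappush(hi, -heapq.heappop(lo))
        # Prune invalidated elements once they surface at a heap top.
        while lo and to_remove[-lo[0]]:
            to_remove[-lo[0]] -= 1
            heapq.heappop(lo)
        while hi and to_remove[hi[0]]:
            to_remove[hi[0]] -= 1
            heapq.heappop(hi)
        medians.append(median())
    return medians
```

Invalid elements are only pruned when they surface at a heap top, so the heaps themselves can grow beyond k elements even though the count of valid elements stays at k.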

Complexity Analysis

  • Time complexity: O(2 · n log k) + O(n - k) ≈ O(n log k).

    • Either (or sometimes both) of the heaps gets every element inserted into it at least once. Each of those insertions takes about O(log k) time, and there are n of them.
    • About (n - k) removals from the tops of the heaps take place (the number of sliding window instances). Each of those takes about O(log k) time.
    • Hash table operations are assumed to take O(1) time each. They happen roughly as many times as the removals from the heaps.
  • Space complexity: O(k) + O(n) ≈ O(n) extra linear space.

    • The heaps collectively require O(k) space.
    • The hash table needs about O(n - k) space.

Approach 3: Two Multisets

Intuition

One can see that multisets are a great way to keep elements sorted while providing efficient access to the first and last elements. Inserting and deleting arbitrary elements are also fairly efficient operations in a multiset. (Refer to Approach 4 Intuition for 295. Find Median From Data Stream)

Thus, if the previous approach gives you too much heartburn, consider replacing the use of priority_queue with multiset.

Algorithm

Inserting or deleting an element is straightforward. Balancing the two multisets follows the same route as Approach 3 of 295. Find Median From Data Stream.
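Python has no built-in multiset, so as an illustrative stand-in the following sketch maintains the two halves as plain sorted lists with bisect (insertions and deletions cost O(k) here because of element shifting, versus O(log k) for a genuine balanced-tree multiset; names are mine, not the article's reference code):

```python
import bisect

def median_sliding_window_halves(nums, k):
    lo, hi = [], []  # sorted lists: lo holds the smaller half, hi the larger

    def rebalance():
        # Invariant: len(lo) is either len(hi) or len(hi) + 1.
        if len(lo) > len(hi) + 1:
            bisect.insort(hi, lo.pop())   # move lo's maximum to hi
        elif len(lo) < len(hi):
            bisect.insort(lo, hi.pop(0))  # move hi's minimum to lo

    medians = []
    for i, num in enumerate(nums):
        # Insert the incoming element into the correct half.
        if lo and num <= lo[-1]:
            bisect.insort(lo, num)
        else:
            bisect.insort(hi, num)
        # Delete the outgoing element once the window exceeds size k.
        if i >= k:
            out = nums[i - k]
            if out <= lo[-1]:
                lo.pop(bisect.bisect_left(lo, out))
            else:
                hi.pop(bisect.bisect_left(hi, out))
        rebalance()
        if i >= k - 1:
            medians.append(float(lo[-1]) if k % 2 else (lo[-1] + hi[0]) / 2.0)
    return medians
```

With a real multiset (as in C++), the same bookkeeping gives the O(n log k) bound stated in the complexity analysis, since arbitrary insertion and deletion each cost O(log k).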

Complexity Analysis

  • Time complexity: O((n - k) · 6 · log k) ≈ O(n log k).

    • At worst, there are three set insertions and three set deletions from the start or end. Each of these takes about O(log k) time.
    • Finding the mean takes constant O(1) time since the starts and ends of the sets are directly accessible.
    • Each of these steps takes place about (n - k) times (the number of sliding window instances).
  • Space complexity: O(k) extra linear space to hold the contents of the window.


Approach 4: Multiset and Two Pointers

Intuition

This is the same as Approach 4 for 295. Find Median From Data Stream.

But, we don't actually need two pointers.

Median elements are derived using a single iterator position (when the window size k is odd) or two consecutive iterator positions (when k is even). Hence keeping track of only one pointer is sufficient. The other pointer can be implicitly derived when required.

Algorithm

  • A single iterator mid, which iterates over the window multiset. It is analogous to upper_median in Approach 4 for 295. Find Median From Data Stream. lower_median is implicitly derived from mid: it is either equal to mid (when the window size k is odd) or prev(mid) [1] (when k is even).

  • We start by populating our multiset window with the first k elements. We set mid to the ⌊k/2⌋-indexed element in window (0-based indexing; multisets always maintain their sorted property).

  • While inserting an element num into window, three cases arise:

    1. num is less than the value of upper median mid.

    2. num is greater than the value of upper median mid.

    3. num is equal to the value of the upper median mid. How this situation is handled is language-dependent. Since C++ multisets insert elements at the end of their equal range, this situation is essentially the same as the previous case.

    • For the first case, num is inserted before the upper median element mid. Thus mid no longer points to the ⌊k/2⌋-indexed element; in fact, it points to the ⌊k/2⌋ + 1-indexed element. We fix that by decrementing mid.

    • For the second and third cases, num is inserted after the upper median element mid and hence does not spoil the mid iterator. It still points to the ⌊k/2⌋-indexed element.

    • Of course, the window size just increased to k + 1 in all three cases. That will easily be fixed by removing the element that is about to exit the window.

  • While removing an element num, the same three cases arise as when we wanted to insert an element:

    1. num is less than the value of upper median mid.

    2. num is greater than the value of upper median mid.

    3. num is equal to the value of upper median mid. Since mid will point to the first occurrence of num in the multiset window and we deterministically remove the first occurrence (take note that we use std::multiset::lower_bound() [2] to find the correct occurrence to erase), this case is handled in the same manner as the first case.

    • In the first and third cases, removing num will spoil the mid iterator. Thus we need to fix that by incrementing mid before we remove that element.

    • For the second case, the mid iterator is not spoiled. No further action is required.

    • Once this element has been removed, the window size returns to being k.

  • After insertion of the incoming element and removal of the outgoing element, we are left again with some nice invariants:

    1. The window size is again k.
    2. The window is still fully sorted.
    3. mid still points to the ⌊k/2⌋-indexed element.
  • Finding the median of the window is easy! It is simply the mean of the elements pointed to by the two pointers lo_median and hi_median. In our case those are mid or prev(mid) (as decided by whether k is odd or even), and mid respectively.

[3]

Complexity Analysis

  • Time complexity: O((n - k) log k) + O(k) ≈ O(n log k).

    • Initializing mid takes about O(k) time.
    • Inserting or deleting a number takes O(log k) time for a standard multiset scheme. [4]
    • Finding the mean takes constant O(1) time since the median elements are directly accessible from the mid iterator.
    • The last two steps take place about (n - k) times (the number of sliding window instances).
  • Space complexity: O(k) extra linear space to hold the contents of the window.


Further Thoughts

As noted before, this problem is essentially an extension of 295. Find Median From Data Stream. That problem can be solved in a lot of ways that, frankly, are not of much use in an interview, but they are interesting to follow all the same. If you are interested, take a look here. Try extending those methods to this problem.


  1. std::prev() is a C++ method to find the previous element to the current one being pointed to by an iterator. ↩︎

  2. Had we used std::multiset::find(), there was no guarantee that the first occurrence of num would be found. Although the contrary did happen in all of our tests, I don't recommend using it. Your mileage may vary. ↩︎

  3. Shout-out to @votrubac and @StefanPochmann! ↩︎

  4. Hinting can reduce that to amortized constant O(1) time. ↩︎


Comments: 17

ColinBin:
For the two-heaps approach, is it possible that all invalid numbers never occur at the top of the heap, and thus are never actually removed from it? Since the actual heap size may then be O(n), the time complexity may be O(n · log n). Consider 102, 101, 1, 2, 3, 100, 4, 99, 5, 98, 6, 97, ... with k = 4. Correct me if I'm wrong.
FelixGEEK:
@StefanPochmann Hi, I remember that Java's PriorityQueue.remove(Object) takes O(k) time, so if we use remove(Object), the total time complexity will be O(nk + n log k), right?
ZoroDuncan:
I think the two-heaps time complexity is O(N · K):
1. Inserting/removing numbers from heaps of size K takes O(log K).
2. Removing the element going out of the sliding window takes O(K), as we will be searching for this element in the heap.
harleyquinn:
Since multiset is not available in Java, the last two approaches are really difficult there; you would have to implement an AVL tree with deletion :(
tiagsters:
I believe the time complexity for Approach 1 should be O(nk), not O(n · k log k). The insort step takes O(k + log k), not O(k log k): you find the position in O(log k), then insert in O(k).
https://docs.python.org/3/library/bisect.html#bisect.insort_left
yingfu:
Simply put, so beautiful!
rainmaker9001:
For the two-heaps approach, could we also use a "LinkedTwoHeaps" data structure (compare with LinkedHashMap in Java), in which we have two heaps whose elements are linked-list nodes pointing to other nodes in insertion order? As we move our sliding window, we would delete the first node, add a new node, and then rebalance the two heaps.
dril:
"An immediate challenge at this point is balancing the heaps while keeping extraneous elements. This is done by actually moving some elements to the heap which has extraneous elements, from the other heap. This cancels out the effect of having extraneous elements and maintains the invariant that the heaps are balanced." This paragraph confuses people. Consider removing it.
Pythagoras_the_3rd:
Essentially Solution 2 in Python, if anyone is looking.

class Solution:
    def medianSlidingWindow(self, nums: List[int], k: int) -> List[float]:
        smalls = []
        bigs = nums[:k]
        heapq.heapify(bigs)
        heapq.heapify(smalls)
        while len(smalls) < len(bigs):
            heapq.heappush(smalls, -heapq.heappop(bigs))

        removals = collections.Counter()

        medians = []
        i = k-1
        while i < len(nums):
            medians.append((bigs[0]-smalls[0])/2.0 if k%2 == 0 else -smalls[0])
            i += 1
            if i == len(nums):
                break
            out_num = nums[i-k]
            in_num = nums[i]
            balance = 0
            balance += -1 if out_num <= -smalls[0] else 1
            removals[out_num] += 1

            if smalls and in_num <= -smalls[0]:
                balance += 1
                heapq.heappush(smalls, -in_num)
            else:
                balance -= 1
                heapq.heappush(bigs, in_num)

            if balance < 0:
                heapq.heappush(smalls, -heapq.heappop(bigs))
                balance += 1
            if balance > 0:
                heapq.heappush(bigs, -heapq.heappop(smalls))
                balance -= 1

            while smalls and removals[-smalls[0]]:
                removals[-smalls[0]] -= 1
                heapq.heappop(smalls)

            while bigs and removals[bigs[0]]:
                removals[bigs[0]] -= 1
                heapq.heappop(bigs)

        return medians
snehavarma:
Approach 1 seems to be giving TLE.
Report
